
fix: Remove evaluation metric key from schema which failed on some LLMs#105

Merged
jsonbailey merged 11 commits into main from jb/aic-1897/remove-keys-from-evaluation-structure
Mar 16, 2026
Conversation

@jsonbailey
Contributor

@jsonbailey jsonbailey commented Mar 11, 2026

fix: Improve metric token collection for Judge evaluations when using LangChain
fix: Include raw response when performing Judge evaluations


Note

Medium Risk
Updates the judge structured-output contract and parsing logic, plus changes LangChain structured invocation to return parsed/raw data and token usage; this could affect downstream integrations expecting the old schema or metrics behavior.

Overview
Judge evaluations now use a fixed structured-output shape. EvaluationSchemaBuilder no longer bakes evaluation_metric_key into the schema; Judge now expects a top-level {score, reasoning} object and keys the parsed result by the config's metric key, failing the evaluation when the response does not parse into a valid score/reasoning pair.
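The parsing step above can be sketched as follows. This is a hypothetical illustration (the function name `parse_judge_result` and the exact return shape are assumptions, not the SDK's actual code): the judge validates a top-level `{score, reasoning}` payload and re-keys it under the config's metric key, returning `None` to signal a failed evaluation.

```python
from typing import Any, Dict, Optional


def parse_judge_result(
    parsed: Optional[Dict[str, Any]], metric_key: str
) -> Optional[Dict[str, Dict[str, Any]]]:
    """Key a top-level {score, reasoning} payload by the config's metric key.

    Returns None (i.e. a failed evaluation) when the response does not
    parse into a valid score/reasoning pair.
    """
    if not isinstance(parsed, dict):
        return None
    score = parsed.get("score")
    reasoning = parsed.get("reasoning")
    # Reject non-numeric scores and missing/non-string reasoning.
    if not isinstance(score, (int, float)) or isinstance(score, bool):
        return None
    if not isinstance(reasoning, str):
        return None
    return {metric_key: {"score": float(score), "reasoning": reasoning}}
```

Keying by the config's metric key at parse time, rather than baking the key into the schema, is what lets a single fixed schema work across metrics (and across LLMs that choke on dynamic schema keys).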

LangChain structured invocations now capture more telemetry and handle Bedrock better. invoke_structured_model uses include_raw=True, returns only the parsed payload, surfaces raw_response, extracts token usage from either usage_metadata or response_metadata, and treats parsing_error as a failed structured call; provider mapping now routes bedrock and bedrock:* to bedrock_converse and injects Bedrock’s foundation provider parameter when needed.
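The token-usage fallback and provider routing described above can be sketched like this. These helper names (`extract_token_usage`, `map_provider`) and the returned dict shape are assumptions for illustration, not the SDK's actual API; the fallback order (prefer `usage_metadata`, then `response_metadata`) follows the description in the PR.

```python
from types import SimpleNamespace
from typing import Any, Dict


def extract_token_usage(message: Any) -> Dict[str, int]:
    """Pull token counts from a LangChain AIMessage-like object.

    Prefers the first-class usage_metadata attribute and falls back to
    provider-specific counts nested in response_metadata.
    """
    usage = getattr(message, "usage_metadata", None)
    if usage:
        return {
            "input": usage.get("input_tokens", 0),
            "output": usage.get("output_tokens", 0),
            "total": usage.get("total_tokens", 0),
        }
    meta = getattr(message, "response_metadata", None) or {}
    token_usage = meta.get("token_usage") or meta.get("usage") or {}
    return {
        "input": token_usage.get("prompt_tokens", 0),
        "output": token_usage.get("completion_tokens", 0),
        "total": token_usage.get("total_tokens", 0),
    }


def map_provider(provider: str) -> str:
    """Route bedrock and bedrock:* provider names to bedrock_converse."""
    if provider == "bedrock" or provider.startswith("bedrock:"):
        return "bedrock_converse"
    return provider


# Demo object standing in for a LangChain AIMessage with usage_metadata.
msg = SimpleNamespace(
    usage_metadata={"input_tokens": 10, "output_tokens": 5, "total_tokens": 15},
    response_metadata={},
)
```

Treating `parsing_error` as a failed structured call (not shown) pairs with `include_raw=True`: the raw response is still surfaced for diagnostics even when the parsed payload is rejected.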

Written by Cursor Bugbot for commit a303c3d. This will update automatically on new commits. Configure here.

@jsonbailey jsonbailey requested a review from a team as a code owner March 11, 2026 22:40

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@jsonbailey jsonbailey merged commit f951dac into main Mar 16, 2026
35 checks passed
@jsonbailey jsonbailey deleted the jb/aic-1897/remove-keys-from-evaluation-structure branch March 16, 2026 20:46
@github-actions github-actions bot mentioned this pull request Mar 16, 2026
jsonbailey added a commit that referenced this pull request Mar 16, 2026
🤖 I have created a release *beep* *boop*
---


<details><summary>launchdarkly-server-sdk-ai: 0.16.1</summary>

## [0.16.1](launchdarkly-server-sdk-ai-0.16.0...launchdarkly-server-sdk-ai-0.16.1) (2026-03-16)


### Bug Fixes

* Improve metric token collection for Judge evaluations when using LangChain ([f951dac](f951dac))
* Improve raw response handling when performing Judge evaluations using LangChain ([f951dac](f951dac))
* Simplify judge structured output to improve reliability of judge scores for some LLMs ([#105](#105)) ([f951dac](f951dac))
</details>

<details><summary>launchdarkly-server-sdk-ai-langchain: 0.3.2</summary>

## [0.3.2](launchdarkly-server-sdk-ai-langchain-0.3.1...launchdarkly-server-sdk-ai-langchain-0.3.2) (2026-03-16)


### Bug Fixes

* Improve metric token collection for Judge evaluations when using LangChain ([f951dac](f951dac))
* Improve raw response handling when performing Judge evaluations using LangChain ([f951dac](f951dac))
* Simplify judge structured output to improve reliability of judge scores for some LLMs ([#105](#105)) ([f951dac](f951dac))
* Update comments for setting default ([#99](#99)) ([a14761d](a14761d))
</details>

<details><summary>launchdarkly-server-sdk-ai-openai: 0.2.1</summary>

## [0.2.1](launchdarkly-server-sdk-ai-openai-0.2.0...launchdarkly-server-sdk-ai-openai-0.2.1) (2026-03-16)


### Bug Fixes

* Update comments for setting default ([#99](#99)) ([a14761d](a14761d))
</details>

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See the [documentation](https://github.com/googleapis/release-please#release-please).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> This is a Release Please version/changelog update only, touching manifests and package metadata but no functional runtime code. Risk is low aside from potential release/versioning inconsistencies if any file was missed.
>
> **Overview**
> Publishes a new release by bumping versions for `launchdarkly-server-sdk-ai` to `0.16.1`, `launchdarkly-server-sdk-ai-langchain` to `0.3.2`, and `launchdarkly-server-sdk-ai-openai` to `0.2.1` (manifest, `pyproject.toml`, and `ldai.__version__`).
>
> Updates changelogs (and the `PROVENANCE.md` version snippet) to reflect the included bug fixes around Judge evaluation output/metrics and documentation comments.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 380481b. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: jsonbailey <jbailey@launchdarkly.com>